Final Report:

Computer-Based Measurement of Intellectual Capabilities



Objectives 

The objectives of this research program were based on a review of previous
research literature that identified the potential of computerized adaptive
testing to reduce at least five kinds of errors in the measurement of human
capacities:

1. Errors due to mismatch of test item difficulty with testee ability;

2. Errors due to the psychological effects of testing;

3. Errors due to inappropriate dimensionality;

4. Errors due to failure to extract sufficient information from the testee;

5. Errors due to over-simplistic conceptualizations of intellectual
capabilities.

Within the context of these five sources of error, which act to reduce the
precision, accuracy and utility of current ability testing procedures, the
research was designed to:

1. Extend previous research efforts to identify the most useful computer-based
adaptive testing strategies.

2. Study the psychological effects of computerized adaptive testing, to iden- 
tify those testing conditions which minimize adverse effects and maximize 
positive effects. 

3. Investigate the problem of intra-individual multidimensionality in ability
testing.

4. Examine the use of such response modes as probabilistic responding and
free-response methods for use in computerized adaptive testing in order to
extract maximum information from each examinee's response to each test
item.

5. Develop, refine and evaluate new computer-administered ability tests which
measure abilities not now measurable using paper and pencil ability
testing.

Research in pursuance of these primary objectives began in September 1975
and continued through December 1978. A contract extension, funded by the Navy
Personnel Research and Development Center, was designed to complete a
live-testing validity comparison of adaptive and conventional tests using
Marine recruits. This extension continued the contract through September 1979.
Technical reports were completed through January 1983.

Approach

The major focus of the research was on the evaluation of adaptive testing
strategies by comparison of their characteristics with each other and with
conventional tests. Both monte carlo simulation and live testing were used in
these studies. In Research Report 75-6 the stradaptive testing strategy was
examined in monte carlo simulation to evaluate various scoring techniques
possible with this testing strategy, under various test lengths and prior
information conditions. Performance of the stradaptive testing strategy was
also evaluated in live testing (Research Report 80-3) by comparing its validity
with that of a conventional test and a Bayesian adaptive test.

The Bayesian adaptive testing strategy was further studied in several
reports. Monte carlo simulation was used in Research Report 76-1 to examine the
performance of this testing strategy under several item pool configurations and
at a number of test lengths. In Research Reports 80-5 and 83-1, the reliability
and validity of the Bayesian adaptive test were compared with those of
conventional tests in a college population (80-5) and in a military recruit
population (83-1). Research Report 77-4 describes a procedure for improving the
efficiency of item selection in Bayesian adaptive testing.

Several other problems concerned with the application of adaptive tests to
the measurement of abilities were discussed in a symposium presented at the 1976
meeting of the Military Testing Association (Research Report 77-1). An overview
of adaptive testing strategies, presented by McBride, included a discussion of
item selection strategies, scoring adaptive tests, and problems of evaluating
adaptive tests. The problem of estimating trait status in adaptive testing
based on item response theory approaches was presented by Sympson, including a
comparison of the characteristics of Bayesian and likelihood-based estimates.
Vale, in his paper, considered the problem of classifying individuals into
discrete ability categories (e.g., pass-fail); his monte carlo analysis compared
adaptive and conventional tests designed for making dichotomous classifications.

The effects of testing conditions on test performance were investigated in
a number of live-testing studies. Since computer-administered testing permits
immediate scoring of an examinee's answer to a test question, it becomes
possible to inform the examinee immediately after each response is given as to
whether the answer was correct or incorrect. This immediate knowledge of
results, or immediate feedback, was investigated in several studies in terms of
its effects on ability test performance in adaptive and conventional tests
(Research Reports 76-3 and 78-2), its interaction with test difficulty (Research
Report 78-2) and computer versus self-paced test administration (Research Report
81-2), and its effects on examinees' reactions to test administration (Research
Reports 76-4 and 81-2). Related studies examined the effects of time limits on
test-taking behavior (Research Report 76-2) and the accuracy of the perceived
difficulty of test items (Research Report 77-3).

The question of intra-individual dimensionality in performance on ability
tests was recast within the more general framework of the fit of individuals to
item response theory (IRT) models. This issue was examined in one study
(Research Report 79-7) in which the predicted and actual performance of single
individuals was examined for indications of lack of person fit due to
intra-individual multidimensionality or other factors reflecting non-fit to the
unidimensional IRT models.

The use of test item response modes other than the multiple-choice item was
examined in one study (Research Report 77-2) which compared test information
derived from free-response administration to that of the same items administered
in multiple-choice mode.

The use of the unique capability of interactive computers to measure
abilities not measurable by paper-and-pencil tests was examined in one study
(Research Report 80-2). An interactive spatial reasoning test was designed based
on the popular "15 puzzle" in which examinees were required to restructure a set
of 15 numerals into a target pattern using a minimum number of moves. Examinee
performance on the test was analyzed in terms of such factors as number of moves
to solution, quality of the moves, and response latencies at each point in the
testing procedure.

Major Findings 

The major findings below are generally organized according to the original
objectives of the research program. Additional details are in the Research
Report abstracts. Many of the original Research Reports contain additional
important findings.

Adaptive Testing Strategies 

1. Monte carlo data comparing the stradaptive test with non-adaptive approaches
to ability testing (Research Report 75-6) show that the stradaptive
test provides more equiprecise measurement than a peaked conventional test.
As item discriminations increased, the equiprecision of the stradaptive
test increased relative to that of the conventional test.

2. A stradaptive test with an average of 25% fewer items than a conventional
test obtained significantly higher validities with a college grade-point
average criterion than did the conventional test (Research Report 80-3).

3. Monte carlo evaluation of a Bayesian adaptive testing strategy identified a
number of psychometric problems in the ability estimates resulting from
this testing strategy (Research Report 76-1). Bayesian ability estimates
were highly correlated with test length, were non-linearly biased for about
two-thirds of the ability range, and were dependent on the prior ability
estimate.

4. Although the monte carlo simulations of the Bayesian adaptive test
identified these potential problems with the Bayesian ability estimates, they
appeared to have little impact on the reliability and validity of Bayesian
ability estimates. Live-testing studies of the Bayesian adaptive testing
strategy in a college population showed validities equal to those of a
conventional test (Research Report 80-3), and high reliabilities for tests of 2
to 30 items in length (Research Report 80-5); in the latter study, however,
using a concurrent validity criterion, the conventional test had higher
validity correlations than the adaptive test. In a military recruit
population (Research Report 83-1), the Bayesian adaptive test achieved both
higher validities and higher reliabilities than did a comparable
conventional test. In this population, a 9-item adaptive test achieved the same
reliability as a 17-item conventional test; 10- to 11-item adaptive tests
achieved the same concurrent validities as 28- to 30-item conventional
tests.

5. The original form of the Bayesian adaptive test used an item-search
procedure that could require excessive amounts of computing time for an
interactive test administration environment. A rapid item-search procedure was
developed and shown to select the same subset of items as the original
procedure in about one-tenth the amount of computer time.

6. Different methods of estimating ability from adaptive tests have different
characteristics. Validities in the prediction of college grade-point
averages from a stradaptive test were higher for ability estimates not based on
IRT methods than they were for IRT-based ability estimates (Research Report
80-3). Within the IRT methods for estimating ability, Bayesian methods are
slightly order dependent, resulting in slightly different ability estimates
with the same items administered in different orders (Sympson, in Research
Report 77-1). Bayesian ability estimates also have different psychometric
characteristics than do estimates based on maximum likelihood procedures.

7. Adaptive tests can be used for classification purposes as well as for
measurement on a continuous scale. When compared to conventional tests
designed to make classifications, adaptive tests can classify more accurately
than conventional tests when it is necessary to make more than a single
dichotomous classification based on test scores (Vale, in Research Report
77-1).

Test Administration Conditions 

8. An analysis of response latency data showed that testees approach different
testing procedures in different ways (Research Report 76-2). The response
latency data suggest that these different test-taking styles and strategies 
might be potentially useful as moderator or predictor variables in the pre- 
diction of external criteria. 

9. Computer-administered feedback (immediate knowledge of results) on a
conventional test appears to result in enhanced ability test performance for
testees of all ability levels (Research Report 76-3). Under
computer-administered feedback conditions, mean test scores were significantly
higher for both high- and low-ability testees. Ninety percent of college students
favorably evaluated their experience with computer-administered feedback 
(Research Report 76-4). 

10. Adaptive tests appear to be more intrinsically motivating for low-ability
testees (Research Report 76-4), and result in higher ability estimates
(Research Report 76-3), than similarly administered conventional tests. This
suggests that adaptive testing might eliminate some of the undesirable
psychological effects characteristic of conventional testing procedures,
resulting in fairer and more accurate test scores for testees who typically
obtain low scores on conventional ability tests.

11. Item-difficulty perceptions of college students were highly related to
objective indices of test item difficulty (Research Report 77-3). This
suggests that test difficulty, which may differ between conventional and
adaptive tests for examinees of the same ability, might be an important factor
affecting the test performance of individuals.

12. Test difficulty interacted with immediate knowledge of results to produce
effects on ability estimates, but not on psychological reactions to the
testing conditions (Research Report 78-2). Since difficulty is more equal
across ability levels in an adaptive test than in a conventional test,
these results suggest that the testing environment of adaptive tests will
result in fewer sources of error in ability estimates than will
conventional ability tests.

Other Findings 

13. Analysis of person-fit data derived from the person response curve
indicated that the vast majority of college students studied responded to a set
of test items in accordance with the 3-parameter logistic IRT model (Research
Report 79-7). The person response curve approach also identified a small
group of individuals whose responses to the test items appeared to result
from an underlying multidimensional ability structure with respect to the
ability domain studied.

14. The dependence of adaptive testing on the multiple-choice item will result
in test scores with less than optimal properties. Analysis of free-response
item data indicates that more informative ability estimates can be
derived from free-response items than from the same items administered as
multiple-choice items and scored by optimal IRT methods; differences were
greater for high-ability examinees (Research Report 77-2).

15. Interactive computer administration of ability test items permits the
design and implementation of ability tests using novel item formats, which
may extend the range of measurable abilities beyond those now measurable
using a dimensional approach. The design and implementation of an interactive
spatial problem-solving test (Research Report 80-2) permitted the
measurement and analysis of a number of problem-solving types of variables
that described individual differences in problem-solving styles; these
variables might be useful as ability kinds of variables, following further
study and refinement.

Implications for Further Research 

The findings and experience of this research program support the
feasibility, utility and psychometric advantages of computerized adaptive
measurement of intellectual capabilities. However, many new questions were
raised by the research, and some of the original questions addressed are still
in need of further research.

Research has concentrated on comparison of the stradaptive and Bayesian
adaptive testing strategies with conventional tests. Further research is needed
(1) comparing these strategies directly with each other, in both live testing
and in simulation, and (2) comparing these strategies with other adaptive
testing strategies, such as an information-based item selection routine.
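
An information-based item selection routine of the kind mentioned above can be
sketched as follows. This is only an illustration, not a procedure taken from
any of the Research Reports: it assumes the three-parameter logistic model with
the usual scaling constant D = 1.7, uses the standard item information formula
for that model, and picks the unused item with the greatest information at the
current ability estimate. The names p3pl, item_information, select_next_item,
and the three-item pool are hypothetical.

```python
import math

D = 1.7  # conventional logistic scaling constant (an assumption of this sketch)


def p3pl(theta, a, b, c):
    """Probability of a correct response under the 3-parameter logistic model.

    a = discrimination, b = difficulty, c = lower asymptote (guessing).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Item information of a 3PL item at ability level theta."""
    p = p3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def select_next_item(theta_hat, pool, administered):
    """Index of the not-yet-administered item with maximum information."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))


# Hypothetical pool of (a, b, c) triples: easy, medium, and hard items.
pool = [(1.0, -1.5, 0.2), (1.0, 0.0, 0.2), (1.0, 1.5, 0.2)]
first = select_next_item(0.0, pool, administered=set())
second = select_next_item(0.0, pool, administered={first})
print(first, second)
```

For an examinee whose current estimate is 0.0, the medium-difficulty item is
chosen first; once it has been used, the routine falls back to the next most
informative item in the pool.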

All adaptive testing strategy comparisons to date that used monte carlo
simulation techniques have made two assumptions that are not characteristic of
real data. First, they have assumed that the item pool is characterized by
items with known parameter values. In real item pools, however, item parameter
values are never known, but are always estimated. These estimates are only
approximations to the true values and, as a consequence, contain some degree of
error, with rather substantial degrees of error for some of the item parameters.
Since adaptive testing strategies are designed to explicitly select items based
on these item parameter estimates, the possibility exists that in a real item
pool with error-laden item parameters adaptive tests might perform less
optimally due to the error in the item parameter estimates. Thus, simulation
studies should be designed and implemented to experimentally vary the degrees
of error in item parameter estimates and to evaluate the effects of these
errors on the performance of adaptive testing strategies.
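
One way such a simulation might be set up is sketched below. The sketch is a
hypothetical design, not one reported under this contract: item responses are
always generated from the true parameters, while a separate "estimated" pool,
formed by adding normally distributed error of controllable size to each true
parameter, is what an adaptive strategy would see when selecting and scoring
items. The function names, the error model, and the bounds placed on the
perturbed parameters are all assumptions.

```python
import math
import random

D = 1.7  # conventional logistic scaling constant (assumed)


def p3pl(theta, a, b, c):
    """3-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def perturb_pool(true_pool, sd_a, sd_b, sd_c, rng):
    """Return 'estimated' parameters: true values plus Gaussian error.

    Discriminations are kept positive and guessing parameters are kept in
    [0, 0.5]; these bounds are arbitrary choices made for the sketch.
    """
    estimated = []
    for a, b, c in true_pool:
        a_hat = max(0.05, a + rng.gauss(0.0, sd_a))
        b_hat = b + rng.gauss(0.0, sd_b)
        c_hat = min(0.5, max(0.0, c + rng.gauss(0.0, sd_c)))
        estimated.append((a_hat, b_hat, c_hat))
    return estimated


def simulate_response(theta, true_item, rng):
    """Generate a 0/1 response from the TRUE item parameters."""
    return 1 if rng.random() < p3pl(theta, *true_item) else 0


rng = random.Random(0)
true_pool = [(1.0, b / 2.0, 0.2) for b in range(-4, 5)]
for sd in (0.0, 0.1, 0.3):  # experimentally varied degree of error
    est_pool = perturb_pool(true_pool, sd, sd, sd / 3.0, rng)
    mean_abs_b_error = sum(
        abs(e[1] - t[1]) for e, t in zip(est_pool, true_pool)
    ) / len(true_pool)
    print(sd, round(mean_abs_b_error, 3))
```

An adaptive testing strategy would then be run once with true_pool and once
with each est_pool, and the resulting ability estimates compared with the
simulated true abilities.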

A second assumption made in all monte carlo comparisons of adaptive testing
strategies is that the item pool is strictly unidimensional, since only one set
of item parameter values is used for each item. In real data, however, item
pools are very rarely strictly unidimensional. Frequently, item pools are
characterized by second and succeeding factors that account for from trivial
portions of the item pool variance to substantial portions of that variance.
While multidimensional IRT models have not yet been sufficiently operationalized
to permit the estimation of item parameters for dimensions beyond the first, it
is possible to examine the effects of multidimensionality on adaptive testing
strategies. One approach to studying this problem is to simulate the
administration of adaptive testing strategies with unidimensional item
parameters when item responses are generated from an underlying
multidimensional structure. This approach assumes that the dimensionality of
the item responses is the true underlying multidimensional structure, while the
apparent unidimensionality of the item pool is the result of the item
parameterization process applied to it. Studies of this type would enable the
identification of the degrees and types of multidimensionality that could be
tolerated by the various adaptive testing strategies without serious
degradation of their performance.
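
The response-generation half of such a study might look like the sketch below.
It assumes, purely for illustration, a compensatory multidimensional logistic
model in which each item has one discrimination (loading) per dimension, an
intercept d, and a guessing parameter c; the report does not specify a
generating model, so the functional form and all names here are hypothetical.
The adaptive strategy under study would continue to use a single
unidimensional (a, b, c) set for each item.

```python
import math
import random

D = 1.7  # conventional logistic scaling constant (assumed)


def p_multidim(thetas, loadings, d, c):
    """Probability correct under an assumed compensatory multidimensional model.

    thetas   -- examinee abilities, one per dimension
    loadings -- item discriminations, one per dimension
    d        -- item intercept (equals a * b in the unidimensional case)
    c        -- lower asymptote (guessing)
    """
    z = D * (sum(a * t for a, t in zip(loadings, thetas)) - d)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def generate_responses(thetas, items, rng):
    """Generate one examinee's 0/1 responses to a list of items."""
    return [
        1 if rng.random() < p_multidim(thetas, loadings, d, c) else 0
        for loadings, d, c in items
    ]


rng = random.Random(0)
# Hypothetical two-dimensional items: a dominant first factor and a second
# factor whose strength would be varied across simulation conditions.
items = [((1.0, 0.3), 0.0, 0.2), ((0.8, 0.6), 0.5, 0.2), ((1.2, 0.0), -0.5, 0.2)]
print(generate_responses((0.5, -1.0), items, rng))
```

Setting every second loading to zero recovers strictly unidimensional data, so
the same code can supply the baseline condition against which increasing
degrees of multidimensionality are compared.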

Further live-testing comparisons of adaptive testing strategies are also
necessary. The four live-testing studies completed under this contract yielded
somewhat conflicting results. In two of the four studies, adaptive tests
obtained higher validities than conventional tests with a smaller average number
of items, and in one study with a smaller median number of items. In the study
using military recruits a very clear advantage was obvious for the adaptive
tests beginning at short test lengths. When a large group of college students
was studied, however, although the expected differences in reliability were
obtained, the conventional test performed better on the concurrent validity
criterion. Since the design of the two large-sample studies was similar,
differences in results could be attributable to differences in the examinees,
the item pools, or the criterion tests. Additional live-testing studies are
needed to evaluate the effects of these conditions, as well as to evaluate the
performance of other adaptive testing strategies and to evaluate their
performance with additional criterion variables.

Test Administration Conditions 

The research results show that a number of test administration variables
influence test scores, IRT-based ability estimates, and/or examinees' reactions
to tests. These include test speededness, test difficulty, and immediate
feedback to examinees as to whether their item responses are correct or
incorrect. Testing strategy (adaptive versus conventional) also had some effects
on test performance and reactions, probably due to the differing difficulties of
adaptive and conventional tests. Immediate feedback of results appeared to be an
important potential factor in increasing test-taking motivation and improving
test scores.

Studies completed on the effects of test administration conditions have all
utilized volunteer college students as examinees and have used verbal ability
items in the tests administered. Since the test-taking motivation of volunteer
students might differ when tested under conditions where the tests are being
used for grading or other purposes, future studies should examine the effects of
test administration conditions when the tests being administered are to be used
for purposes other than research. In addition, the generality of the observed
effects should be studied on populations other than college students, and using
other tests in addition to verbal ability tests. Further studies should also
include the effects of other adaptive testing strategies as test administration
conditions, in conjunction with immediate knowledge of results.

Intra-Individual Dimensionality, Response Modes, and New Abilities

Research in these three areas was only begun during the contract period.
The person characteristic curve results show that the vast majority of the one
group of college students studied responded to a set of test items in accordance
with the three-parameter logistic IRT model. A small group of students was
identified, however, whose responses appeared to be reliably divergent from that
model. These deviations were ascribed to intra-individual multidimensionality.
Since the person response curve method was used in only this one study, further
studies are indicated. Of importance is the performance in monte carlo
simulations of the person-fit indices under conditions of unidimensionality, the
derivation of appropriate sampling distributions of the person-fit indices, the
evaluation of alternate person-fit indices, and the effect of test structure
characteristics (e.g., distributions of item characteristics) on the performance
of person-fit indices. Additional live-testing studies should also be
implemented to study the effects of various test administration conditions
(e.g., interruptions, poor testing conditions, immediate knowledge of results)
on intra-individual dimensionality by means of the person response curve and
associated indices of person fit.
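
The general idea of comparing a single examinee's observed performance with
model predictions can be sketched as below. This is a simplified illustration
only, and should not be read as the procedure of Research Report 79-7: items
are sorted by difficulty and grouped into strata, and for each stratum the
examinee's observed proportion correct is set beside the proportion expected
under the three-parameter logistic model at the examinee's ability estimate.
The function names, the stratification rule, and the example data are
assumptions.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed)


def p3pl(theta, a, b, c):
    """3-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def person_response_curve(responses, items, theta_hat, n_strata=3):
    """Observed and expected proportion correct per difficulty stratum.

    responses -- list of 0/1 item scores for one examinee
    items     -- list of (a, b, c) triples, in the same order as responses
    Returns a list of (observed, expected) pairs, easiest stratum first.
    """
    if not items:
        return []
    order = sorted(range(len(items)), key=lambda i: items[i][1])
    size = math.ceil(len(order) / n_strata)
    curve = []
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        observed = sum(responses[i] for i in idx) / len(idx)
        expected = sum(p3pl(theta_hat, *items[i]) for i in idx) / len(idx)
        curve.append((observed, expected))
    return curve


# Hypothetical six-item test with difficulties from -2 to 3.
items = [(1.0, float(b), 0.2) for b in range(-2, 4)]
responses = [1, 1, 1, 0, 0, 0]
for observed, expected in person_response_curve(responses, items, 0.0):
    print(round(observed, 2), round(expected, 2))
```

Large or erratic gaps between the observed and expected columns, such as poor
performance on easy items combined with good performance on hard ones, are the
kind of pattern a person-fit index would be designed to summarize.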

Failure to extract sufficient information from an examinee's responses to
multiple-choice test items can lower the quality of obtained measurements. The
one study completed on this problem indicated that the use of free-response
items was able to improve the measurement precision of a set of vocabulary items
beyond that possible from scoring the same items as polychotomous
multiple-choice items. Both of these administration/scoring modes provide better
measurement than dichotomously-scored multiple-choice items. Since this study
used college students on a single short vocabulary test, further studies are
obviously needed to examine the generality of the results. In addition, research
is needed to examine the performance of other alternatives to the
dichotomously-scored multiple-choice item such as probabilistic responding,
which are now feasible when administered by interactive computers.

Interactive computer administration of ability tests makes possible the
development of a wide range of new kinds of ability tests to supplement the
standard dimensionality-based tests currently in use. This project has
demonstrated that interactive administration of a problem-solving type of test
can result in substantial amounts of new kinds of data on examinees in addition
to the traditional number of items answered correctly. These data can include
information on problem-solving styles and response latencies that might be
indicative of other individual differences problem-solving variables. Future
research should investigate the psychometric characteristics of these variables,
including their reliabilities and their contributions to validity, as well as
examine the utility of the interactive computer for measuring other abilities
such as spatial, perceptual, and memory abilities which are now possible to be
measured by computer administration.

RESEARCH REPORT ABSTRACTS

Research Report 75-6 
A Simulation Study of Stradaptive Ability Testing
C. David Vale and David J. Weiss
December 1975 

A conventional test and two forms of a stradaptive test were administered to
thousands of simulated subjects by minicomputer. Characteristics of the three
tests using several scoring techniques were investigated while varying the
discriminating power of the items, the lengths of the tests, and the
availability of prior information about the testee's ability level. The tests
were evaluated in terms of their correlations with underlying ability, the
amount of information they provided about ability, and the equiprecision of
measurement they exhibited. Major findings were (1) scores on the conventional
test correlated progressively less with ability as item discriminating power was
increased beyond a = 1.0; (2) the conventional test provided increasingly poorer
equiprecision of measurement as items became more discriminating; (3) these
undesirable characteristics were not characteristic of scores on the stradaptive
test; (4) the stradaptive test provided higher score-ability correlations than
the conventional test when item discriminations were high; (5) the stradaptive
test provided more information and better equiprecision of measurement than the
conventional test when test lengths and item discriminations were the same for
the two strategies; (6) the use of valid prior ability estimates by stradaptive
strategies resulted in scores which had better measurement characteristics than
scores derived from a fixed entry point; (7) a Bayesian scoring technique
implemented within the stradaptive testing strategy provided scores with good
measurement characteristics; and (8) further research is necessary to develop
improved flexible termination criteria for the stradaptive test. (AD A020961)

Research Report 76-1 
Some Properties of a Bayesian Adaptive Ability Testing Strategy
James R. McBride and David J. Weiss
March 1976 

Four monte carlo simulation studies of Owen's Bayesian sequential procedure for
adaptive mental testing were conducted. Whereas previous simulation studies of
this procedure have concentrated on evaluating it in terms of the correlation of
its test scores with simulated ability in a normal population, these four
studies explored a number of additional properties, both in a normally
distributed population and in a distribution-free context. Study 1 replicated
previous studies with finite item pools, but examined such properties as the
bias of estimate, mean absolute error, and correlation of test length with
ability. Studies 2 and 3 examined the same variables in a number of hypothetical
infinite item pools, investigating the effects of item discriminating power,
guessing, and variable vs. fixed test length. Study 4 investigated some
properties of the Bayesian test scores as latent trait estimators, under three
different item pool configurations (regressions of item discrimination on item
difficulty). The properties of interest included the regression of latent trait
estimates on actual trait levels, the conditional bias of such estimates, the
information curve of the trait estimates, and the relationship of test length to
ability level. The results of these studies indicated that the ability estimates
derived from the Bayesian test strategy were highly correlated with ability
level. However, the ability estimates were also highly correlated with number of
items administered, were non-linearly biased, and provided measurements which
were not of equal precision at all levels of ability. (AD A022964)



Research Report 76-2
Effects of Time Limits on Test-Taking Behavior
T. W. Miller and David J. Weiss 
April 1976 

Three related experimental studies analyzed rate and accuracy of test response
under time-limit and no-time-limit conditions. Test instructions and
multiple-choice vocabulary items were administered by computer. Student
volunteers received monetary rewards under both testing conditions. In the first
study, college students were blocked into high- and low-ability groups on the
basis of pretest scores. Results for both ability groups showed higher response
rates under time-limit conditions than under no-time-limit conditions. There
were no significant differences between the time-limit and no-time-limit
accuracy scores. Similar results were obtained in a second study in which each
student received both time-limit and no-time-limit conditions. In a third study
each testee received the same testing condition twice, and higher response rates
were observed under the time-limit condition; response accuracy remained
consistent across testing conditions. All three studies showed essentially zero
correlations between response rate and response accuracy. Response latency data
were also analyzed in the three studies. These data suggested the existence of
different test-taking styles and strategies under time-limit and no-time-limit
testing conditions. The results of these studies suggest that number-correct
scores from time-limit tests are a complex function of response rate, response
accuracy, test-taking style and test-taking strategy, and therefore are not
likely to be as valid or as useful as number-correct scores from no-time-limit
tests. (AD A024422)



Research Report 76-3 
Effects of Immediate Knowledge of Results 
and Adaptive Testing on Ability Test Performance 
Nancy E. Betz and David J. Weiss
June 1976 

This study investigated the effects of immediate knowledge of results (KR)
concerning the correctness or incorrectness of each item response on a
computer-administered test of verbal ability. The effects of KR were examined on
a 50-item conventional test and a stradaptive ability test and in high- and
low-ability groups. The primary dependent variable was maximum likelihood
ability estimates derived from the item responses. Results indicated that mean
test scores for the High-Ability group receiving KR were higher than for the
No-KR group on both the conventional and stradaptive tests. For Low-Ability
examinees, mean scores were higher under KR conditions than under No-KR
conditions on both tests, but the difference was statistically significant only
for the conventional test. However, the higher mean scores of the Low-Ability
testees on the stradaptive test indicated that for low-ability examinees,
adaptive testing had the same effects on test performance as did the provision
of immediate KR. Knowledge of results did not have significant effects on either
response latencies, response consistency on the stradaptive test, or the
internal consistency reliability of the conventional test. No significant score
differences were found on a 44-item post-test administered without KR,
indicating that the facilitative effects of knowledge of results on test
performance were confined to the test in which KR was provided. The results of
the study were interpreted as indicating the potential of both immediate
knowledge of results and adaptive testing procedures to increase the extent to
which ability tests measure "maximum performance" levels. (AD A027147)



Research Report 76-4 
Psychological Effects of Immediate Knowledge of 
Results and Adaptive Ability Testing 
Nancy E. Betz and David J. Weiss
June 1976 

This study investigated the effects of providing immediate knowledge of results 
(KR) and adaptive testing on test anxiety and test-taking motivation. Also 
studied was the accuracy of student perceptions of the difficulty of adaptive 
and conventional tests administered with or without immediate knowledge of re- 
sults. Testees were 350 college students divided into high- and low-ability 
groups and randomly assigned to one of four test strategies by KR conditions. 
The ability level of examinees was found to be related to their reported levels 
of motivation and to differences in reported motivation under the different 
testing conditions. Low-ability examinees reported significantly higher levels 
of motivation on the stradaptive test than on the conventional test, while the 
reported motivation of high-ability examinees did not differ as a function of 
ability level. Low-ability testees reported lower motivation with KR than with- 
out KR, while higher ability testees reported higher motivation with KR. Analy- 
sis of the anxiety data indicated that students reported significantly higher 
levels of anxiety on the stradaptive test than on the conventional test. The 
provision of KR did not result in significant differences in reported anxiety. 
However, highest levels of anxiety were reported by the low-ability group on the 
stradaptive test administered with KR. These results, in conjunction with pre- 
viously reported data on effects of KR on ability test performance, were inter- 
preted as being the result of facilitative anxiety. Students were able to per- 
ceive the relative difficulty of test items with some accuracy. However, per- 
ceptions of the relative degree of test difficulty were much more closely relat- 
ed to actual test score on the conventional test than on the stradaptive test. 
Over 90% of the students reacted favorably to the provision of immediate KR. 
These results suggest that adaptive testing creates a psychological environment 
for testing which is more equivalently motivating for examinees of all ability 
levels and results in a greater standardization of the test-taking environment, 
than does conventional testing. (AD A027170) 



Research Report 77-1 
Applications of Computerized Adaptive Testing 
James R. McBride, James B. Sympson, 
C. David Vale, Steven M. Pine, and Isaac I. Bejar 
Edited by David J. Weiss 
March 1977 



This symposium consisted of five papers: 

1. James R. McBride: A Brief Overview of Adaptive Testing 

Adaptive testing is defined, and some of its item selection and scoring 
strategies briefly discussed. Item response theory, or item characteristic 
curve theory, which is useful for the implementation of adaptive testing, is 
briefly described. The concept of "information" in a test is introduced 
and discussed in the context of both adaptive and conventional tests. The 
advantages of adaptive testing, in terms of the nature of information it 
provides, are described. 

2. James B. Sympson: Estimation of Latent Trait Status in Adaptive Testing 
Procedures 

The role of latent trait theory in measurement for criterion prediction and 
in criterion-referenced measurement is explicated. It is noted that latent 
trait models allow both norm-referenced and criterion-referenced inter- 
pretations of test performance. Using a 3-parameter logistic test model, 
an example of sequential estimation in a 20-item adaptive test is present- 
ed. After each item is administered, four different ability estimates (two 
likelihood-based and two Bayesian estimates) are calculated. Characteris- 
tics of the four estimation methods are discussed. The information avail- 
able in the items selected by the adaptive test is compared with the infor- 
mation available from application of latent trait theory, and adaptive 
testing is advocated as a useful approach to human assessment. 

3. C. David Vale: Adaptive Testing and the Problem of Classification 

The use of adaptive testing procedures to make ability classification deci- 
sions (i.e., cutting score decisions) is discussed. Data from computer 
simulations comparing conventional testing strategies with an adaptive 
testing strategy are presented. These data suggest that, although a con- 
ventional test is as good as an adaptive test when there is one cutting 
score at the middle of the distribution of ability, an adaptive test can 
provide better classification decisions when there is more than one cutting 
score. Some utility considerations are also discussed. 

4. Steven M. Pine: Applications of Item Characteristic Curve Theory to the 
Problem of Test Bias 

It is argued that a major problem in current efforts to develop less biased 
tests is an over-reliance on classical test theory. Item characteristic 
curve (ICC) theory, which is based on individual rather than group-oriented 
measurement, is offered as a more appropriate measurement model. A defini- 
tion of test bias based on ICC theory is presented. Using this definition, 
several empirical tests for bias are presented and demonstrated with real 
test data. Additional applications of ICC theory to the problem of test 
bias are also discussed. 

5. Isaac I. Bejar: Applications of Adaptive Testing in Measuring Achievement 
and Performance 

The paper reviews two relatively recent developments in psychometric 
theory: the assessment of partial knowledge and research in adaptive test- 
ing. It is argued that the use of non-dichotomous item formats, needed for 
the assessment of partial knowledge, and now made possible by the adminis- 
tration of achievement test items on interactive computers, should result 
in achievement test scores which are a more realistic and precise indica- 
tion of what a student can do. 
(AD A038114) 



Research Report 77-2 
A Comparison of Information Functions of Multiple-Choice 
and Free-Response Vocabulary Items 
C. David Vale and David J. Weiss 
April 1977 

Twenty multiple-choice vocabulary items and 20 free-response vocabulary items 
were administered to 660 college students. The free-response items consisted of 
the stem words of the multiple-choice items. Testees were asked to respond to 
the free-response items with synonyms. A computer algorithm was developed to 
transform the numerous free-responses entered by the testees into a manageable 
number of categories. The multiple-choice and the free-response items were then 
calibrated according to Bock's polychotomous logistic model. One item was dis- 
carded because of extremely poor fit with the model, and test information func- 
tions were determined from the other 19 items. Higher levels of information 
were obtained from the free-response items over most of the range of abilities 
between θ = -3.0 and θ = +3.0. 



Research Report 77-3 
Accuracy of Perceived Test-Item Difficulties 
J. Stephen Prestwood and David J. Weiss 
May 1977 

This study investigated the accuracy with which testees perceive the difficulty 
of ability-test items. Two 41-item conventional tests of verbal ability were 
constructed for administration to testees in two ability groups. Testees in 
both the high- and low-ability groups responded to each multiple-choice item by 
choosing the correct alternative and then rating the item's difficulty relative 
to their levels of ability. Least-squares estimates of item difficulty, which 
were based on the difficulty ratings, correlated highly with proportion-correct 
and latent trait estimates of item difficulty based on a norming sample. Least- 
squares estimates of testee ability, which were based solely on the difficulty 
perceptions of the testees, correlated significantly with number-correct and 
maximum-likelihood ability scores based on the testees' conventional responses 
to the items. These results show that item-difficulty perceptions were highly 
related to the "objective" indices of item difficulty often used in test con- 
struction and that as testee ability level increased, the items were perceived 
as being relatively less difficult. The relationship between a testee's ability 
and his/her perception of an individual item's relative difficulty appeared to 
be weak. Of major importance was the finding that items which were appropriate 
in difficulty levels from a psychometric standpoint were perceived by the tes- 
tees as being too difficult for their ability levels. The effects on testees of 
tailoring a test such that items are perceived as being uniformly too difficult 
should be investigated. (AD A041084) 



Research Report 77-4 
A Rapid Item-Search Procedure for Bayesian Adaptive Testing 
C. David Vale and David J. Weiss 
May 1977 

An alternative item-selection procedure for use with Owen's Bayesian adaptive 
testing strategy is proposed. This procedure is, by design, faster than Owen's 
original procedure because it searches only part (as compared with all) of the 
total item pool. Item selections are, however, identical for both methods. 
After a conceptual development of the rapid-search procedure, the supporting 
mathematics are presented. In a simulated comparison with three item pools, the 
rapid-search procedure required as little as one-tenth the computer time as 
Owen's technique. (AD A041090) 



Research Report 78-2 
The Effects of Knowledge of Results and Test Difficulty 
on Ability Test Performance and Psychological Reactions to Testing 
J. Stephen Prestwood and David J. Weiss 
September 1978 

Students were administered one of three conventional or one of three stradaptive 
vocabulary tests with or without knowledge of results (KR). The three tests of 
each type differed in difficulty, as assessed by the expected proportion of cor- 
rect responses to the test items. Results indicated that the mean maximum-like- 
lihood estimates of individuals' abilities varied as a joint function of KR pro- 
vision and test difficulty. Students receiving KR scored highest on the most- 
difficult test and lowest on the least-difficult test; students receiving no KR 
scored highest on the least-difficult test and did most poorly on the most- 
difficult test. Although the students perceived the differences in test diffi- 
culty, there were no effects on mean student anxiety or motivation scores at- 
tributable to difficulty alone. Regardless of test difficulty, students reacted 
very favorably to receiving KR, and its provision increased the mean level of 
reported motivation. 



Research Report 79-7 
The Person Response Curve: Fit of Individuals 
to Item Characteristic Curve Models 
Tom E. Trabin and David J. Weiss 
December 1979 

This study investigated a method of determining the fit of individuals to item 
characteristic curve (ICC) models using the person response curve (PRC). The 
construction of observed PRCs is based on an individual's proportion correct on 
test item subsets (strata) that differ systematically in difficulty level. A 
method is proposed for identifying irregularities in an observed PRC by compar- 
ing it with the expected PRC predicted by the three-parameter logistic ICC model 
for that individual's ability level. Diagnostic potential of the PRC is dis- 
cussed in terms of the degree and type of deviations of the observed PRC from 
the expected PRC predicted by the model. 
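
The comparison of observed and expected PRCs described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the function names, the scaling constant D = 1.7, the item parameters, and the use of simple observed-minus-expected differences as a measure of deviation are all hypothetical.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def observed_prc(responses, strata):
    """Proportion correct within each difficulty stratum.
    responses: 0/1 item scores; strata: stratum label for each item."""
    totals, correct = {}, {}
    for r, s in zip(responses, strata):
        totals[s] = totals.get(s, 0) + 1
        correct[s] = correct.get(s, 0) + r
    return {s: correct[s] / totals[s] for s in sorted(totals)}

def expected_prc(theta, items, strata):
    """Mean model-predicted probability correct within each stratum
    for a testee with ability estimate theta."""
    totals, sums = {}, {}
    for (a, b, c), s in zip(items, strata):
        totals[s] = totals.get(s, 0) + 1
        sums[s] = sums.get(s, 0.0) + p_3pl(theta, a, b, c)
    return {s: sums[s] / totals[s] for s in sorted(totals)}

# Hypothetical data: two easy items (stratum 0) and two hard items (stratum 1).
items = [(1.0, -1.0, 0.2), (1.0, -1.0, 0.2), (1.0, 1.0, 0.2), (1.0, 1.0, 0.2)]
strata = [0, 0, 1, 1]
responses = [1, 1, 1, 0]

obs = observed_prc(responses, strata)
exp = expected_prc(0.0, items, strata)
# Assumed deviation measure: observed minus expected proportion per stratum.
deviations = {s: obs[s] - exp[s] for s in obs}
```

Large deviations in particular strata (for example, unexpectedly low proportions correct on easy items) would be the kind of irregularity the PRC is intended to flag.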

Observed PRCs were constructed for 151 college students using vocabulary test 
data on 216 items of wide difficulty range. Data on students' test-taking moti- 
vation, test-taking anxiety, and perceived test difficulty were also obtained. 
PRCs for the students were found to be reliable and to have shapes that were 
primarily a function of ability level. Three-parameter logistic model expected 
PRCs served as good predictors of observed PRCs for over 90% of the group. As 
anticipated from this general overall fit of the observed data to the ICC model, 
there were no significant correlations between degree of non-fit and test-taking 
motivation, test-taking anxiety, or perceived test difficulty. Using split-pool 
observed PRCs, a few students were identified who deviated significantly from 
the expected PRC. 

The results of this study suggested that three-parameter logistic expected PRCs 
for given ability levels were good predictors of test response profiles for the 
students in this sample. Significant non-fit between observed and expected PRCs 
would suggest the interaction of additional dimensions in the testing situation 
for a given individual. Recommendations are made for further research on person 
response curves. 



Research Report 80-2 
Interactive Computer Administration of a Spatial Reasoning Test 
Austin T. Church and David J. Weiss 
April 1980 

This report describes a pilot study on the development and administration of a 
test using a spatial reasoning problem, the 15-puzzle. The test utilized the 
on-line capabilities of a real-time computer (1) to record an examinee's prog- 
ress on each problem through a sequence of problem-solving "moves" and (2) to 
collect additional on-line data that might be of relevance to the evaluation of 
examinee performance (e.g., number of illegal and repeated moves, response la- 
tency trends). The examinees, 61 students in an introductory psychology class, 
were required to type a sequence of moves that would bring one 4x4 array of 
scrambled numbers (start configuration) into agreement with a second 4x4 array 
(goal configuration), using as few moves as possible. Data analyses emphasized 
the comparison of several methods of indexing problem difficulty, methods of 
scoring individual performance, and the relationship between response latency 
data, performance, and problem-solving strategy. 

Subjective ratings of the perceived difficulty of replications of the 15-puzzle 
were obtained from a separate student sample to investigate (1) the subjective 
dimensions used by students in evaluating the difficulty of this problem type, 
(2) how accurately the actual performance difficulty of these problems could be 
evaluated by students, and (3) whether there were reliable individual differ- 
ences in difficulty perceptions related to actual performance differences. 

Results of the study suggested that four performance indices might be useful in 
indexing problem difficulty: (1) mean number of moves in the sample, (2) pro- 
portion of students solving the problem, (3) proportion of students solving the 
problem in the optimal number of moves, and (4) a Special Difficulty Index, de- 
fined as the sample mean number of moves divided by the minimum number of moves 
required. Four alternative methods of scoring total test performance and two 
methods of scoring individual problem performance were studied. The scores that 
took into account differential numbers of moves between the optimal and maximum 
number allowed were related somewhat more to performance ratings obtained from 
independent judges. 
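
The four difficulty indices listed above can be computed as in the following sketch. The function name and the data are hypothetical, and one detail is an assumption not stated in the abstract: unsolved attempts are recorded as None and excluded from the mean number of moves.

```python
def difficulty_indices(moves, min_moves):
    """Compute four problem-difficulty indices for one problem.
    moves: number of moves used by each student, or None if unsolved
           (assumption: unsolved attempts are excluded from the mean).
    min_moves: minimum (optimal) number of moves required."""
    solved = [m for m in moves if m is not None]
    if not solved:
        raise ValueError("no student solved the problem")
    mean_moves = sum(solved) / len(solved)
    return {
        "mean_moves": mean_moves,
        "prop_solved": len(solved) / len(moves),
        "prop_optimal": sum(1 for m in solved if m == min_moves) / len(moves),
        # Special Difficulty Index: sample mean moves / minimum moves required
        "special_difficulty_index": mean_moves / min_moves,
    }

# Hypothetical data: four students, one of whom did not solve the problem.
indices = difficulty_indices([10, 12, None, 10], min_moves=10)
```

Because the Special Difficulty Index divides by the minimum number of moves, it equals 1.0 when every solver finds an optimal solution and grows as solutions become less efficient, which makes it comparable across problems of different lengths.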

Examination of problem performance indices, the Special Difficulty Index, and 
students' perceptions of the difficulty of the test problems indicated that most 
of the problems were too easy for most students. However, the possibility of 
obtaining a more discriminating subset of problems was suggested by item-total 
score correlations obtained for each problem. The data suggested that better 
consistency might be obtained using problems of similar difficulty levels, and 
it was hypothesized that an adaptive test tailoring problems to the ability lev- 
el of each student would increase the reliability of measurement. 

Mean initial and total "move" latencies for each problem were strongly related 
to some of the performance indices of problem difficulty. At the level of indi- 
vidual performance, only total latency or problem solution time was related to 
problem performance. Latency data appeared to confound differences in the abil- 
ity to visualize a sequence of moves and differences in students' work styles. 
Strong evidence for these work styles was found in student consistency of ini- 
tial, average, and total response latency measures across all problems. 

Perceived difficulty ratings showed reliable individual differences in the level 
and variability of difficulty perceptions. The data suggested that the individ- 
ual differences found were related to individual differences in ability to visu- 
alize and to maintain a sequence of moves in short-term memory. It was conclud- 
ed that an adequate selection of problem replications should be able to tap 
these differences, resulting in reliable solution performance differences. 

Improvements in problem selection and design were suggested by the data in this 
study. Future tests of this type should consist of fewer but more difficult 
problems, particularly problems not permitting reactive, impulsive solutions. 
This type of test would seem especially appropriate for adaptive administra- 
tion: (1) scores on problems tailored to the individual's ability would likely 
be more highly related to each other, resulting in more highly reliable total 
scores; (2) the motivational aspects of the tests, which seem more taxing and 
potentially frustrating than conventional item formats, would likely be im- 
proved, and (3) for most testees equally precise measurements could be obtained 
in shorter periods of time than with conventional test administration. 



Research Report 80-3 
Criterion-Related Validity of Adaptive Testing Strategies 
Janet G. Thompson and David J. Weiss 
June 1980 

Criterion-related validity of two adaptive tests was compared with a convention- 
al test in two groups of college students. Students in Group 1 (N = 101) were 
administered a stradaptive test and a peaked conventional test; students in 
Group 2 (N = 131) were administered a Bayesian adaptive test and the same peaked 
conventional test. All tests were computer-administered multiple-choice vocabu- 
lary tests; items were selected from the same pool, but there was no overlap of 
items between the adaptive and conventional tests within each group. The strad- 
aptive test item responses were scored using four different methods (two mean 
difficulty scores, a Bayesian score, and maximum likelihood) with two different 
sets of item parameter estimates, to study the effects on criterion-related va- 
lidity of scoring methods and/or item parameter estimates. Criterion variables 
were high school and college grade-point averages (GPA), and scores on the Amer- 
ican College Testing Program (ACT) achievement tests. 

Results indicated generally higher validities for the adaptive tests; at least 
one method of scoring the stradaptive tests resulted in higher correlations than 
the conventional test with seven of the eight criterion variables (and equal 
correlations for the eighth), even though the stradaptive test administered over 
25% fewer items, on the average, than did the conventional test. The stradap- 
tive test obtained a significantly higher correlation with overall college GPA 
(r = .27) than did the conventional test; when math GPA was partialled from 
overall GPA, the maximum correlation for the stradaptive test with an average 
length of 29.2 items was r = .51, while the 40-item conventional test correlated 
only .36. The data showed generally higher criterion-related validities for the 
mean difficulty scores on the stradaptive test in comparison to the Bayesian and 
maximum likelihood scores; the different item parameter estimates had no effect 
on validity, resulting in scores that correlated .98 with each other. 

Although the mean length of the Bayesian adaptive test was 48.7 items, the medi- 
an number of items (35) was less than that of the 40-item conventional test. 
Ability estimates from this adaptive test also correlated higher with seven of 
the eight criterion variables than did scores on the conventional tests, al- 
though none of the differences were statistically significant. 

These data indicate that adaptive tests can achieve criterion-related validities 
equal to, and in some cases significantly greater than, those obtained by con- 
ventional tests while administering up to 27% fewer items, on the average. The 
data also suggest that latent-trait-based scoring of stradaptive tests may not 
be optimal with respect to criterion-related validity. Limitations of the study 
are discussed and suggestions are made for additional research. (AD A087595) 



Research Report 80-5 
An Alternate-Forms Reliability and Concurrent Validity 
Comparison of Bayesian Adaptive and Conventional Ability Tests 
G. Gage Kingsbury and David J. Weiss 
December 1980 

Two 30-item alternate forms of a conventional test and a Bayesian adaptive test 
were administered by computer to 472 undergraduate psychology students. In ad- 
dition, each student completed a 120-item paper-and-pencil test, which served as 
a concurrent validity criterion test, and a series of very easy questions de- 
signed to detect students who were not answering conscientiously. All test 
items were five-alternative multiple-choice vocabulary items. Reliability and 
concurrent validity of the two testing strategies were evaluated after the ad- 
ministration of each item for each of the tests, so that trends indicating dif- 
ferences in the testing strategies as a function of test length could be detect- 
ed. For each test, additional analyses were conducted to determine whether the 
two forms of the test were operationally alternate forms. 

Results of the analysis of alternate-forms correspondence indicated that for all 
test lengths greater than 10 items, each of the alternate forms for the two test 
types resulted in fairly constant mean ability level estimates. When the scor- 
ing procedure was equated, the mean ability levels estimated from the two forms 
of the conventional test differed to a greater extent than those estimated from 
the two forms of the Bayesian adaptive test. 

The alternate-forms reliability analysis indicated that the two forms of the 
Bayesian test resulted in more reliable scores than the two forms of the conven- 
tional test for all test lengths greater than two items. This result was ob- 
served when the conventional test was scored either by the Bayesian or propor- 
tion-correct method. 

The concurrent validity analysis showed that the conventional test produced 
ability level estimates that correlated more highly with the criterion test 
scores than did the Bayesian test for all lengths greater than four items. This 
result was observed for both scoring procedures used with the conventional test. 

Limitations of the study, and the conclusions that may be drawn from it, are 
discussed. These limitations, which may have affected the results of this 
study, included possible differences in the alternate forms used within the two 
testing strategies, the relatively small calibration samples used to estimate 
the ICC parameters for the items used in the study, and method variance in the 
conventional tests. (AD A094477) 



Research Report 81-2 
Effects of Immediate Feedback and Pacing of Item Presentation 
on Ability Test Performance and Psychological Reactions to Testing 
Marilyn F. Johnson, David J. Weiss, and J. Stephen Prestwood 
February 1981 

The study investigated the joint effects of knowledge of results (KR or no-KR), 
pacing of item presentation (computer or self-pacing), and type of testing 
strategy (50-item peaked conventional, variable-length stradaptive, or 50-item 
fixed-length stradaptive test) on ability test performance, test item response 
latency, information, and psychological reactions to testing. The psychological 
reactions to testing were obtained from Likert-type items that assessed test- 
taking anxiety, motivation, perception of difficulty, and reactions to knowledge 
of results. Data were obtained from 447 college students randomly assigned to 
one of the 12 experimental conditions. 

The results indicated that there were no effects on ability estimates due to 
knowledge of results, testing strategy, or pacing of item presentation. Al- 
though average latencies were greater on the stradaptive tests than on the con- 
ventional test, the overall testing time was not substantially longer on the 
adaptive tests and may have been a function of differences in test difficulty. 
Analysis of information values indicated higher levels of information on the 
stradaptive tests than on the conventional test. There was no statistically 
significant main effect for any of the three experimental conditions when test 
anxiety or test-taking motivation were the dependent variables, although there 
were some significant interaction effects. 

These results indicate that testing conditions may interact in a complex way to 
determine psychological reactions to the testing environment. The interactions 
do suggest, however, a somewhat consistent standardizing effect of KR on test 
anxiety and test-taking motivation. This standardizing effect of KR showed that 
approximately equal levels of motivation and anxiety were reported under the 
various testing conditions when KR was provided, but that mean levels of these 
variables were substantially different when KR was not provided. Consistent 
with theoretical expectations, the conventional test was perceived as being 
either too easy or too difficult, whereas the adaptive tests were perceived more 
often as being of appropriate difficulty. 

The results concerning the effects of KR on test performance, motivation, and 
anxiety found in this study were contrary to earlier reported findings; and dif- 
ferences in the studies are delineated. Recommendations are made concerning the 
control of specific testing conditions, such as difficulty of the test and abil- 
ity level of the examinee population, as well as suggestions for the further 
analysis of the standardizing effect of KR. 



Research Report 83-1 
Reliability and Validity of Adaptive and Conventional Tests 
in a Military Recruit Population 
John T. Martin, James R. McBride, and David J. Weiss 
January 1983 

A conventional verbal ability test and a Bayesian adaptive verbal ability test 
were compared using a variety of psychometric criteria. Tests were administered 
to 550 Marine recruits, half of whom received two 30-item alternate forms of a 
conventional test and half of whom received two 30-item alternate forms of a 
Bayesian adaptive test. Both types of tests were computer administered and were 
followed by a 50-item conventional verbal ability criterion test. 

The alternate forms of the adaptive test resulted in scores that were much more 
similar in means and variances than were the conventional tests, for which most 
means and variances for various test lengths were significantly different. 
Adaptive testing resulted in significantly higher alternate forms reliability 
correlations for all test lengths through 19 items; reliability of a 9-item 
adaptive test was equal to that of a 17-item conventional test. Validity corre- 
lations were higher for the adaptive procedure for all test lengths. Validity 
of an 11-item adaptive test was equal to that of a 27-item conventional test, in 
spite of lower discriminating items being used, on the average, by the adaptive 
tests in comparison to the conventional test. Very few of the recruits had dif- 
ficulty in responding to the computer-administered instructions on use of the 
testing terminals. Analysis showed some differences in test duration between 
the two testing strategies; where they occurred, they were explained by the 
ability level of the examinees, i.e., higher ability examinees who were adminis- 
tered adaptive tests received more difficult items and therefore had signifi- 
cantly longer testing times. Combined with reduced test length for the adaptive 
test to obtain similar reliabilities and validities to the conventional test, 
however, the slight increases observed in adaptive testing time were negligible. 

The data support the feasibility of adaptive testing with military recruit popu- 
lations and support theoretical predictions of the psychometric superiority of 
adaptive tests in comparison with number-correct scored conventional tests. 
(AD A129324) 